Abstract

In this paper we analyze the average queue lengths in a combined input-output 
queued switch using a maximal size matching scheduling algorithm. We compare these 
average queue lengths to the average queue lengths achieved by an optimal switch. 
We model the cell arrival process as independent and identically distributed between 
time slots and uniformly distributed among input and output ports. For switches with 
many input and output ports, the backlog associated with maximal size matching with 
speedup 3 is no more than $3\frac{1}{3}$ times the backlog associated with an optimal switch.
Moreover, this performance ratio rapidly approaches 2 as speedup increases. 

1 Introduction 

Although packet switches vary in their internal construction, the most common architec- 
ture for high performance switches is the crossbar switch. A crossbar switch contains N 
input lines and N output lines, where each input line meets each output line at a cross- 
point. This is depicted in Figure 1. When a crosspoint connecting an input line and an
output line is closed, cells may be transferred between this input and output. Crossbar 
switches operate with the constraint that, when routing cells from inputs to outputs, each 
input may only be connected to a single output, and each output may only be connected 
to a single input. 

Switches are generally analyzed under a model where time is slotted, and only one cell
may arrive at each input and depart from each output per time slot. Each arriving cell
has a destination output port to which it must eventually be sent. Since multiple cells
with the same output destination may arrive simultaneously at the input ports, switches
require some form of buffering to store the cells which cannot be immediately output.
Buffered crossbar switches vary in their architecture, with the simplest being the output
queued switch. In an output queued switch, buffers are placed at each output. All arriving
cells are placed in their respective output queues in each time slot, and the output queues
are served on a first-come, first-served (FCFS) basis. The average queue backlogs achieved
by output queueing are minimum among all buffered crossbar switches. However, for a
switch with N inputs and N outputs, output queueing requires that as many as N
rounds of scheduling be performed by the switch in each time slot (consider the case
when N cells destined for a single output arrive simultaneously). It is this requirement
that makes output queueing infeasible for switches with many input and output ports.

Figure 1: A depiction of a crossbar switch, where input lines are connected to output
lines by closing crosspoints.

Another alternative is the input queued switch, where cells are placed in buffers at the 
input ports in which they arrive. Delay performance of an input queued switch is heavily 
dependent on the service discipline used to serve the queues. It has been shown that
if input queues are served FCFS, the switch can only achieve 58% throughput. That is,
suppose cells destined for output j arrive at input i at an average rate of $\lambda/N$ per time slot
for all i and j. Then average backlogs are bounded for all $\lambda < 1$ under output queueing, but
average backlogs are only bounded for all $\lambda < 0.58$ when input queues are served FCFS.
However, if we use a service discipline which schedules cells in each input queue based 
on their destinations, it is possible to achieve 100% throughput with input queueing. 
In particular, it was shown in [7] that 100% throughput is achieved by using a service 
discipline based on constructing maximum weight matchings (MWM) between inputs and 
outputs in each round of scheduling. This has the advantage over output queueing that 
only one round of scheduling is required per time slot. However, the algorithms required 
for computing maximum weight matchings are computationally expensive to implement. 
Also, the only known bounds on the average backlog under MWM are $O(N^2)$ [H], as
opposed to output queueing which has an average backlog which increases as $O(N)$.

Combined input-output queued (CIOQ) switches are an alternative to purely input 
queued or purely output queued switches. Combined input-output queued switches place 
buffers at both the input ports and the output ports, and perform some moderate number
$s \ll N$ rounds of scheduling per time slot. The number of rounds of scheduling s is
commonly referred to as the speedup of the switch. It was shown in 0] that 100% throughput
can be achieved by using speedup s = 2 and a simple service discipline based on
greedily constructing maximal size matchings between input ports and output ports in 
each round of scheduling. Unlike pure input queueing with MWM scheduling, maximal 
size matching schedules can be computed with low computational cost. Also, unlike pure 
output queueing, the speedup requirements do not increase with the size of the switch. 

The purpose of this paper is to show that average backlog performance of a CIOQ switch 
using maximal matching scheduling with low speedup is comparable to that of an output 
queued switch. Several previous papers have addressed the problem of analyzing backlogs 
in combined input-output queued switches with speedup. In [2], it was shown that under
any traffic, an output queued switch can be exactly emulated by a CIOQ switch operating
with speedup 2. However, the queueing discipline used in each round of scheduling has
quite high computational cost. In [Sj, an upper bound on average backlog was proven for 
maximal matching scheduling with speedup 2 assuming IID Bernoulli traffic with uniform 
loading on input and output ports. Unlike the best known bound for MWM scheduling, 
the ratio between this upper bound and a lower bound on the backlog for an output 
queued switch is constant as N increases. However, this ratio becomes arbitrarily large 
as the arrival rate $\lambda$ approaches 1. The same problem was considered and another upper
bound on backlog was computed in [§]. There it was shown that the average backlog 
associated with maximal matching with speedup 2 is no more than 5 times the backlog 
associated with an output queued switch. In this paper we also consider switches under 
uniformly loaded IID traffic. We show that average backlog associated with maximal 
matching with speedup s gets arbitrarily close to 2 times the backlog associated with 
an output queued switch as s increases. Specifically, for a switch with many input and
output ports, we show that for speedup s = 3, the backlog associated with maximal
matching is no more than $3\frac{1}{3}$ times the backlog associated with an output
queued switch. This performance ratio rapidly approaches 2 as s increases.

2 Preliminaries 

2.1 Maximal Size Matchings 

Performing a round of scheduling in a crossbar switch can be thought of as constructing 
a matching in a bipartite graph G. This is shown in Figure 2. The vertices in G represent 
input and output ports, and there is an edge between vertices i and j if the queue at input 
port i contains a cell to be sent to output port j. Scheduling corresponds to choosing a 
collection of edges in G. That is, edge $(i, j)$ is chosen if a cell is to be sent from input port
i to output port j. The connectivity constraint imposed by the crossbar requires that the 
scheduled transfers correspond to a matching in the graph. A matching is a subgraph of 
G with the defining property that no two edges are incident on the same vertex. 




Figure 2: A bipartite graph, where the heavy lines show a maximal size matching. 

Scheduling algorithms for input and combined input-output queued switches essentially 
amount to various criteria for selecting matchings. In this paper we consider maximal size 
matchings. The main advantage to scheduling using maximal size matchings is that these 
matchings can be computed very efficiently using a simple greedy algorithm. A maximal
size matching is a subgraph $H \subseteq G$ with the property that if we add any edge in $G - H$ to
H, then H is no longer a matching. The key property of maximal size matchings which
is used in our later proofs is that if edge $(i, j)$ is in G, then there is an edge in H incident
to either vertex i or vertex j.
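
The greedy construction mentioned above can be sketched as follows. This is a minimal illustration, not the specific algorithm analyzed in the paper: the function names `greedy_maximal_matching` and `is_maximal` and the edge-list representation are assumptions made for this example. The second function checks the key property stated above, that every edge of G is incident to a matched vertex.

```python
def greedy_maximal_matching(edges):
    """Return a maximal matching as a list of (i, j) pairs.

    `edges` is an iterable of (input_port, output_port) pairs; an edge (i, j)
    means input queue i holds at least one cell for output j.
    """
    used_inputs, used_outputs, matching = set(), set(), []
    for i, j in edges:
        # Keep the edge only if neither endpoint is already matched.
        if i not in used_inputs and j not in used_outputs:
            matching.append((i, j))
            used_inputs.add(i)
            used_outputs.add(j)
    return matching


def is_maximal(edges, matching):
    """Check that every edge of G touches a vertex covered by the matching."""
    matched_inputs = {i for i, _ in matching}
    matched_outputs = {j for _, j in matching}
    return all(i in matched_inputs or j in matched_outputs for i, j in edges)


# Hypothetical 3 x 3 example: inputs 0 and 1 both hold cells for output 0.
edges = [(0, 0), (0, 1), (1, 0), (2, 2)]
matching = greedy_maximal_matching(edges)
```

Note that the result is maximal but not necessarily maximum: here input 1 stays unmatched even though the matching {(0, 1), (1, 0), (2, 2)} would serve all three inputs.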

2.2 The Markov Chain Switch Model 

Here we assume a traffic model in which at most one cell may arrive at each input in a 
single time slot, and that cell arrivals at all time slots are independent and identically 
distributed. We let $A_{ij}(t) \in \{0, 1\}$ be the random variable giving the number of cells
arriving at input i destined for output j in time slot t. For simplicity, here we consider
the case where arrivals are uniformly and independently distributed across inputs and
outputs. This implies that the first and second moments of $A_{ij}(t)$ are

\[
\begin{aligned}
E[A_{ij}(t)] &= \frac{\lambda}{N}, \\
E[A_{ij}(t)^2] &= \frac{\lambda}{N}, \\
E[A_{kj}(t) A_{ij}(t)] &= \frac{\lambda^2}{N^2}, \quad k \neq i, \\
E[A_{ik}(t) A_{ij}(t)] &= 0, \quad k \neq j,
\end{aligned}
\]

where $0 < \lambda < 1$ is a parameter describing the traffic intensity. Let $D_{ij}(t) \in \{0, \ldots, s\}$
denote the number of cells sent from input queue i to output queue j in time slot t, and
let $E_j(t) \in \{0, 1\}$ be the number of cells served from output queue j in time slot t. Also,
we let $X_{ij}(t)$ denote the number of cells in input queue i destined for output j in time
slot t, and let $Y_j(t)$ denote the number of cells in output queue j in time slot t. These
random variables satisfy

\[
\begin{aligned}
X_{ij}(t+1) &= X_{ij}(t) + A_{ij}(t) - D_{ij}(t), \\
Y_j(t+1) &= Y_j(t) + \sum_{i=1}^{N} D_{ij}(t) - E_j(t).
\end{aligned}
\]

Throughout this paper, we will occasionally write these quantities in lowercase when 
simply referring to feasible values that they may take. 
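
As an illustration of the queue recursions, the following sketch advances the state by one time slot. It rests on several assumptions that are not fixed by the text: the function name `cioq_time_slot` is hypothetical, each of the s rounds builds a maximal matching by a greedy scan in port order, scheduling uses only the cells present at the start of the slot, and output queue j is served whenever it is nonempty after the transfers of that slot.

```python
def cioq_time_slot(x, y, a, s):
    """Advance input queues x[i][j] and output queues y[j] by one time slot.

    a[i][j] holds the arrivals A_ij(t); s is the speedup.
    Returns (x_next, y_next, d, e) with d[i][j] = D_ij(t) and e[j] = E_j(t).
    """
    n = len(y)
    x = [row[:] for row in x]              # working copy of the input queues
    d = [[0] * n for _ in range(n)]
    for _ in range(s):                     # s rounds of scheduling per slot
        used_in, used_out = set(), set()
        for i in range(n):
            for j in range(n):
                # Greedy maximal matching on the cells still waiting.
                if x[i][j] > 0 and i not in used_in and j not in used_out:
                    x[i][j] -= 1
                    d[i][j] += 1
                    used_in.add(i)
                    used_out.add(j)
    sent = [sum(d[i][j] for i in range(n)) for j in range(n)]
    # Assumed service rule: serve one cell if the output queue is nonempty.
    e = [1 if y[j] + sent[j] > 0 else 0 for j in range(n)]
    x_next = [[x[i][j] + a[i][j] for j in range(n)] for i in range(n)]
    y_next = [y[j] + sent[j] - e[j] for j in range(n)]
    return x_next, y_next, d, e


# Hypothetical 2 x 2 example with speedup 2.
x_next, y_next, d, e = cioq_time_slot([[2, 0], [1, 0]], [0, 0],
                                      [[0, 1], [0, 0]], 2)
```

In this example both rounds send a cell from input 0 to output 0, so input 1 is blocked in both rounds; one of the two transferred cells departs and the other remains in output queue 0.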

We will consider the problem of controlling the system to regulate the steady-state
average per-period backlog in the input and output queues,

\[
\lim_{t \to \infty} \frac{1}{t+1} \sum_{k=0}^{t} E\left[ \sum_{i=1}^{N} \sum_{j=1}^{N} X_{ij}(k) + \sum_{j=1}^{N} Y_j(k) \,\middle|\, X(0), Y(0) \right].
\]

Under a maximal matching scheduling policy, D(t) and E(t) depend only on X(t) and
Y(t). When this is the case, this system evolves as a Markov chain and we can use the
following lemma to bound the average per-period backlog. This lemma is a special case 
of a more general result shown in 0. Results similar to the lemma below also appear, 
for example, in [S]. 
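Lemma 1. Consider a Markov chain X with state space $\mathcal{X}$. For the cost function
$r : \mathcal{X} \to \mathbb{R}$, let

\[
J(z) = \lim_{t \to \infty} \frac{1}{t+1} \sum_{k=0}^{t} E[r(X(k)) \mid X(0) = z].
\]

For any $h_U : \mathcal{X} \to \mathbb{R}$ such that $\inf_x \{h_U(x)\} > -\infty$,

\[
J(z) \le \sup_x \{ r(x) + E[h_U(X(t+1)) \mid X(t) = x] - h_U(x) \}
\]

for all $z \in \mathcal{X}$.

Proof. Let $\gamma = \inf_{x \in \mathcal{X}} \{h_U(x)\}$,

\[
\Delta_U(x) = E[h_U(X(t+1)) \mid X(t) = x] - h_U(x),
\]

and

\[
\beta_U = \sup_x \{ r(x) + \Delta_U(x) \}.
\]

For all $t \ge 0$,

\[
\begin{aligned}
\frac{1}{t+1} \sum_{k=0}^{t} E[r(X(k)) \mid X(0)]
&= \frac{1}{t+1} \sum_{k=0}^{t} E[r(X(k)) + \Delta_U(X(k)) \mid X(0)]
 + \frac{h_U(X(0)) - \gamma}{t+1} - \frac{E[h_U(X(t+1)) \mid X(0)] - \gamma}{t+1} \\
&\le \beta_U + \frac{h_U(X(0)) - \gamma}{t+1}.
\end{aligned}
\]

Clearly

\[
\lim_{t \to \infty} \frac{h_U(z) - \gamma}{t+1} = 0
\]

for all $z \in \mathcal{X}$, hence

\[
\lim_{t \to \infty} \frac{1}{t+1} \sum_{k=0}^{t} E[r(X(k)) \mid X(0)] \le \beta_U.
\]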



3 Main Result 

Our overall goal is to show that maximal matching scheduling with a speedup of s keeps
the average backlog relatively close to the backlog achieved by an output queued
switch. Specifically, we will: (i) compute an upper bound on the backlog associated with
maximal matching with speedup s, (ii) compute the backlog associated with an output
queued switch, and (iii) compute a bound on the ratio of these quantities.

Our first step will be to compute the upper bound on the average backlog. The following
lemma will be used in the proof of the bound. Here we will define the quantities

\[
\alpha_1(s) = \frac{1}{(s-1)^2 - 1}, \qquad
\alpha_2(s) = \frac{s-1}{(s-1)^2 - 1},
\]

and

\[
Q_{ij}(\lambda, d, e, s) = (1 + \alpha_1(s) + \alpha_2(s))\lambda
 - \left( \alpha_1(s) \sum_{l=1}^{N} d_{lj} + e_j + \alpha_2(s) \sum_{l=1}^{N} d_{il} \right),
\]

which will be used throughout the rest of this paper.

Lemma 2. For a CIOQ switch operating at speedup $s \ge 3$ using maximal matching
scheduling,

\[
Q_{ij}(\lambda, d, e, s)\, x_{ij} \le (\lambda - 1)\, x_{ij}
\]

for all $\lambda < 1$ and all feasible values of $x_{ij}$, d, and e.

Proof. When operating at speedup s, s rounds of scheduling occur in each time slot. 
When using maximal matching scheduling, if there is a cell in input queue i destined for 
output j at the start of a round of scheduling, then either a cell is removed from input i 
or a cell is sent to output j in that round. Also, if a cell is sent to output queue j, then 
output queue j is served at the end of the time slot. 

It is clear that the lemma holds if $x_{ij} = 0$. To prove the lemma for $x_{ij} > 0$, we will
consider three cases:

1. When $\sum_{l=1}^{N} d_{lj} = s$ and $\sum_{l=1}^{N} d_{il} = 0$,

\[
\alpha_1(s) \sum_{l=1}^{N} d_{lj} + e_j + \alpha_2(s) \sum_{l=1}^{N} d_{il}
 = \alpha_1(s) s + 1
 = \frac{(s-1)^2 - 1 + (s-1) + 1}{(s-1)^2 - 1}
 = 1 + \alpha_1(s) + \alpha_2(s).
\]

2. When $\sum_{l=1}^{N} d_{lj} = 0$ and $\sum_{l=1}^{N} d_{il} = s$,

\[
\alpha_1(s) \sum_{l=1}^{N} d_{lj} + e_j + \alpha_2(s) \sum_{l=1}^{N} d_{il}
 \ge \alpha_2(s) s
 = \frac{(s-1)^2 - 1 + (s-1) + 1}{(s-1)^2 - 1}
 = 1 + \alpha_1(s) + \alpha_2(s).
\]

3. When $\sum_{l=1}^{N} d_{lj} > 0$ and $\sum_{l=1}^{N} d_{il} > 0$, it is sufficient to consider the case where
$\sum_{l=1}^{N} d_{lj} = \sum_{l=1}^{N} d_{il} = 1$ since $\alpha_1(s) > 0$ and $\alpha_2(s) > 0$. In this case,

\[
\alpha_1(s) \sum_{l=1}^{N} d_{lj} + e_j + \alpha_2(s) \sum_{l=1}^{N} d_{il} = 1 + \alpha_1(s) + \alpha_2(s).
\]

Note that if $x_{ij} > 0$ and

\[
\sum_{l=1}^{N} x_{il} + \sum_{l=1}^{N} x_{lj} < s,
\]

then the total number of cells either in input queue i or destined for output queue j is
less than s. However, in this case at least one cell must be sent from input queue i to
output queue j, implying that $\sum_{l=1}^{N} d_{lj} > 0$ and $\sum_{l=1}^{N} d_{il} > 0$. ■
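
The three cases can be checked numerically. The sketch below is only a sanity check under stated assumptions, not part of the proof: the helper names are hypothetical, `col` stands for the number of cells sent to output j, `row` for the number of cells sent from input i, and $e_j$ is set to its smallest feasible value (1 when a cell reaches output j, 0 otherwise). Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def alpha1(s):
    return Fraction(1, (s - 1) ** 2 - 1)


def alpha2(s):
    return Fraction(s - 1, (s - 1) ** 2 - 1)


def lemma2_slack(s, col, row):
    """alpha1*col + e_j + alpha2*row - (1 + alpha1 + alpha2).

    Lemma 2 holds for a given (col, row) when this slack is nonnegative.
    """
    e_j = 1 if col > 0 else 0  # worst case for the inequality
    lhs = alpha1(s) * col + e_j + alpha2(s) * row
    return lhs - (1 + alpha1(s) + alpha2(s))


for s in range(3, 10):
    assert lemma2_slack(s, s, 0) == 0   # case 1 holds with equality
    assert lemma2_slack(s, 0, s) == 0   # case 2 holds with equality
    assert all(lemma2_slack(s, c, r) >= 0   # case 3
               for c in range(1, s + 1) for r in range(1, s + 1))
```

Cases 1 and 2 are tight for every speedup tested, which suggests why $\alpha_1(s)$ and $\alpha_2(s)$ take the form they do.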



Now we are ready to prove the upper bound on the backlog associated with maximal
matching scheduling. We will let $J_{MMs}$ denote the average per-period backlog associated
with maximal matching scheduling with speedup s.

Theorem 3. A CIOQ switch operating with speedup s using a maximal matching scheduling
policy has average per-period backlog satisfying

\[
J_{MMs} \le \frac{\left( k_1(s) \left(1 - \frac{1}{N}\right) \lambda^2 + k_2(s) \lambda - k_3(s) \lambda^2 \right) N}{2(1 - \lambda)},
\]

where

\[
\begin{aligned}
k_1(s) &= 1 + \alpha_1(s), \\
k_2(s) &= 2 + (\alpha_1(s) + \alpha_2(s))(s + 1), \\
k_3(s) &= 2 + 2\alpha_1(s) + 2\alpha_2(s).
\end{aligned}
\]

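Proof. We prove this bound using Lemma 1 with

\[
h_U(x, y) = h_1(x) + h_2(x) + h_3(x, y),
\]

where

\[
\begin{aligned}
h_1(x) &= \frac{\alpha_1(s)}{2(1-\lambda)} \sum_{j=1}^{N} \left( \left( \sum_{i=1}^{N} x_{ij} \right)^2 + (s - 2\lambda) \sum_{i=1}^{N} x_{ij} \right), \\
h_2(x) &= \frac{\alpha_2(s)}{2(1-\lambda)} \sum_{i=1}^{N} \left( \left( \sum_{j=1}^{N} x_{ij} \right)^2 + (s - 2\lambda) \sum_{j=1}^{N} x_{ij} \right), \\
h_3(x, y) &= \frac{1}{2(1-\lambda)} \sum_{j=1}^{N} \left( \left( \sum_{i=1}^{N} x_{ij} + y_j \right)^2 + (1 - 2\lambda) \left( \sum_{i=1}^{N} x_{ij} + y_j \right) \right).
\end{aligned}
\]

Since $h_U$ is quadratic with positive second order coefficients, it is clear that

\[
\inf_{x, y} \{ h_U(x, y) \} > -\infty,
\]

satisfying the required condition of Lemma 1. Let

\[
\Delta_i(x, y, d) = E[h_i(X(t+1), Y(t+1)) \mid x, y, d] - h_i(x, y)
\]

denote the expected drift in $h_i$ when in state (x, y) and action d is taken.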



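\[
\begin{aligned}
\Delta_1(x, y, d) &= \frac{\alpha_1(s)}{2(1-\lambda)} \sum_{j=1}^{N} \left( E\left[ \left( \sum_{l=1}^{N} (x_{lj} + A_{lj} - d_{lj}) \right)^2 \right] - \left( \sum_{l=1}^{N} x_{lj} \right)^2 \right)
 + \frac{\alpha_1(s)(s - 2\lambda)}{2(1-\lambda)} \sum_{j=1}^{N} E\left[ \sum_{l=1}^{N} (A_{lj} - d_{lj}) \right] \\
&\le \alpha_1(s) \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \frac{\lambda - \sum_{l=1}^{N} d_{lj}}{1 - \lambda} \right) x_{ij}
 + \frac{\alpha_1(s) \left( \left(1 - \frac{1}{N}\right) \lambda^2 + (s+1)\lambda - 2\lambda^2 \right) N}{2(1-\lambda)},
\end{aligned}
\]

where we used the fact that

\[
E\left[ \left( \sum_{l=1}^{N} A_{lj} \right)^2 \right] = \sum_{k=1}^{N} \sum_{l=1}^{N} E[A_{kj} A_{lj}] = \lambda + \left(1 - \frac{1}{N}\right) \lambda^2,
\]

and that $\left( \sum_{l=1}^{N} d_{lj} \right)^2 \le s \sum_{l=1}^{N} d_{lj}$, since at most s cells are sent to output j in a time slot.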



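Similarly,

\[
\begin{aligned}
\Delta_2(x, y, d) &= \frac{\alpha_2(s)}{2(1-\lambda)} \sum_{i=1}^{N} \left( E\left[ \left( \sum_{l=1}^{N} (x_{il} + A_{il} - d_{il}) \right)^2 \right] - \left( \sum_{l=1}^{N} x_{il} \right)^2 \right)
 + \frac{\alpha_2(s)(s - 2\lambda)}{2(1-\lambda)} \sum_{i=1}^{N} E\left[ \sum_{l=1}^{N} (A_{il} - d_{il}) \right] \\
&\le \alpha_2(s) \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \frac{\lambda - \sum_{l=1}^{N} d_{il}}{1 - \lambda} \right) x_{ij}
 + \frac{\alpha_2(s) \left( (s+1)\lambda - 2\lambda^2 \right) N}{2(1-\lambda)},
\end{aligned}
\]

where we used the fact that

\[
E\left[ \left( \sum_{l=1}^{N} A_{il} \right)^2 \right] = \sum_{k=1}^{N} \sum_{l=1}^{N} E[A_{ik} A_{il}] = \lambda,
\]

and that $\left( \sum_{l=1}^{N} d_{il} \right)^2 \le s \sum_{l=1}^{N} d_{il}$, since at most s cells are sent from input i in a time slot.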
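Also,

\[
\begin{aligned}
\Delta_3(x, y, d) &= \frac{1}{2(1-\lambda)} \sum_{j=1}^{N} \left( E\left[ \left( \sum_{i=1}^{N} (x_{ij} + A_{ij}) + y_j - e_j \right)^2 \right] - \left( \sum_{i=1}^{N} x_{ij} + y_j \right)^2 \right)
 + \frac{1 - 2\lambda}{2(1-\lambda)} \sum_{j=1}^{N} E\left[ \sum_{i=1}^{N} A_{ij} - e_j \right] \\
&= \frac{1}{1-\lambda} \sum_{j=1}^{N} (\lambda - e_j) \left( \sum_{i=1}^{N} x_{ij} + y_j \right)
 + \frac{\left( \left(1 - \frac{1}{N}\right) \lambda^2 + 2\lambda - 2\lambda^2 \right) N}{2(1-\lambda)}.
\end{aligned}
\]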



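Therefore, with $r(x, y) = \sum_{i=1}^{N} \sum_{j=1}^{N} x_{ij} + \sum_{j=1}^{N} y_j$ denoting the per-period backlog,

\[
\begin{aligned}
r(x, y) + \sum_{i=1}^{3} \Delta_i(x, y, d)
&\le \sum_{i=1}^{N} \sum_{j=1}^{N} \left( 1 + \frac{Q_{ij}(\lambda, d, e, s)}{1 - \lambda} \right) x_{ij}
 + \sum_{j=1}^{N} \left( 1 + \frac{\lambda - e_j}{1 - \lambda} \right) y_j \\
&\quad + \frac{\left( k_1(s) \left(1 - \frac{1}{N}\right) \lambda^2 + k_2(s) \lambda - k_3(s) \lambda^2 \right) N}{2(1 - \lambda)}.
\end{aligned}
\]

From Lemma 2 we have

\[
\sum_{i=1}^{N} \sum_{j=1}^{N} \left( 1 + \frac{Q_{ij}(\lambda, d, e, s)}{1 - \lambda} \right) x_{ij} \le 0
\]

for all values of x. Also, since $e_j = 1$ if $y_j > 0$,

\[
\sum_{j=1}^{N} \left( 1 + \frac{\lambda - e_j}{1 - \lambda} \right) y_j = 0
\]

for all values of y. Therefore,

\[
J_{MMs} \le \sup_{x, y} \{ r(x, y) + \Delta(x, y, d) \}
 \le \frac{\left( k_1(s) \left(1 - \frac{1}{N}\right) \lambda^2 + k_2(s) \lambda - k_3(s) \lambda^2 \right) N}{2(1 - \lambda)},
\]

where $\Delta = \Delta_1 + \Delta_2 + \Delta_3$.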
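
To make the bound of Theorem 3 concrete, the following sketch evaluates it with exact rational arithmetic. The function names `alphas` and `j_mms_bound` are hypothetical, and the formula coded here is the bound as reconstructed from the constants $k_1(s)$, $k_2(s)$, and $k_3(s)$ defined in the theorem, so it should be read as an illustration rather than a reference implementation.

```python
from fractions import Fraction


def alphas(s):
    """Return (alpha_1(s), alpha_2(s)) for speedup s >= 3."""
    den = (s - 1) ** 2 - 1
    return Fraction(1, den), Fraction(s - 1, den)


def j_mms_bound(lam, n, s):
    """Upper bound on the average backlog for load lam, N = n ports, speedup s."""
    a1, a2 = alphas(s)
    k1 = 1 + a1
    k2 = 2 + (a1 + a2) * (s + 1)
    k3 = 2 + 2 * a1 + 2 * a2
    numerator = k1 * (1 - Fraction(1, n)) * lam ** 2 + k2 * lam - k3 * lam ** 2
    return numerator * n / (2 * (1 - lam))


# Hypothetical example: a 4 x 4 switch at load 1/2 with speedup 3.
bound = j_mms_bound(Fraction(1, 2), 4, 3)
```

For this example $k_1 = 4/3$, $k_2 = 6$, and $k_3 = 4$, and the bound evaluates to 9 cells.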



The previous theorem established an upper bound on $J_{MMs}$, the average per-period backlog
under maximal matching with some fixed speedup s. We would now like to determine
the expected per-period backlog associated with an output queued switch, which we will
denote by $J_{OQ}$. This is a standard result, but is presented here to keep our treatment
self-contained.
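Lemma 4. An output queued switch has

\[
J_{OQ} = \frac{\left( \left(1 - \frac{1}{N}\right) \lambda^2 + 2\lambda - 2\lambda^2 \right) N}{2(1 - \lambda)}.
\]

Proof. Output queue j is a discrete-time queue with arrival process $A_{1j} + \cdots + A_{Nj}$.
By the Pollaczek-Khintchine formula (see, for example, [S]), the average
steady-state per-period backlog of output queue j is

\[
\frac{E\left[ \left( \sum_{i=1}^{N} A_{ij} \right)^2 \right] + \lambda - 2\lambda^2}{2(1 - \lambda)}.
\]

Using the fact that

\[
E\left[ \left( \sum_{i=1}^{N} A_{ij} \right)^2 \right] = \sum_{k=1}^{N} \sum_{i=1}^{N} E[A_{kj} A_{ij}] = \lambda + \left(1 - \frac{1}{N}\right) \lambda^2,
\]

we sum over all output queues to obtain

\[
J_{OQ} = \frac{\left( \left(1 - \frac{1}{N}\right) \lambda^2 + 2\lambda - 2\lambda^2 \right) N}{2(1 - \lambda)}.
\]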
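
The second-moment identity used in this proof can be checked directly. The sketch below assumes, as in the uniform traffic model, that the number of cells arriving for output j in a slot is Binomial(N, λ/N), because the N inputs are independent and each sends a cell to output j with probability λ/N. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def second_moment_column(lam, n):
    """E[(sum_i A_ij)^2] computed from the Binomial(n, lam/n) distribution."""
    p = lam / n
    return sum(k * k * comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1))


def j_oq(lam, n):
    """Average backlog of an output queued switch (Lemma 4)."""
    numerator = (1 - Fraction(1, n)) * lam ** 2 + 2 * lam - 2 * lam ** 2
    return numerator * n / (2 * (1 - lam))


lam, n = Fraction(1, 2), 4
moment = second_moment_column(lam, n)
# Per-queue Pollaczek-Khintchine backlog, summed over the n output queues.
per_queue = (moment + lam - 2 * lam ** 2) / (2 * (1 - lam))
```

With λ = 1/2 and N = 4 the second moment is 11/16, which agrees with $\lambda + (1 - 1/N)\lambda^2$, and summing the per-queue backlog over the four outputs gives 11/4.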



The upper bound and the result of the previous lemma are now used to determine a
bound on the performance ratio between maximal matching and output queueing.

Theorem 5. The ratio of the average backlog under maximal matching scheduling with
speedup $s \ge 3$ to the average backlog of an output queued switch satisfies

\[
\frac{J_{MMs}}{J_{OQ}} \le \left( \frac{N}{N-1} \right) \left( \frac{2(s-1)^2 + (s-1)}{(s-1)^2 - 1} \right). \qquad (1)
\]



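Proof. From Theorem 3 and Lemma 4 we have

\[
\frac{J_{MMs}}{J_{OQ}} \le \frac{k_1(s) \left(1 - \frac{1}{N}\right) \lambda + k_2(s) - k_3(s) \lambda}{2 - \left(1 + \frac{1}{N}\right) \lambda}.
\]

By differentiating, it is straightforward to show that for $s \ge 3$ and $N \ge 2$, the previous
expression is increasing in $\lambda$ for $0 < \lambda < 1$. Therefore,

\[
\begin{aligned}
\frac{J_{MMs}}{J_{OQ}}
&\le \frac{k_1(s) \left(1 - \frac{1}{N}\right) + k_2(s) - k_3(s)}{1 - \frac{1}{N}} \\
&= \frac{1 + \alpha_1(s) s + \alpha_2(s)(s - 1) - (1 + \alpha_1(s)) \frac{1}{N}}{1 - \frac{1}{N}} \\
&\le \frac{N}{N-1} \left( 1 + \alpha_1(s) s + \alpha_2(s)(s - 1) \right) \\
&= \left( \frac{N}{N-1} \right) \left( \frac{2(s-1)^2 + (s-1)}{(s-1)^2 - 1} \right).
\end{aligned}
\]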
For large N, the performance ratio approaches 2 as s increases. Table 1 shows the value
of this ratio for several low values of speedup.

s             |  3   |  4   |  5   |  8   |  15
J_MMs/J_OQ    | 3.36 | 2.65 | 2.42 | 2.20 | 2.10

Table 1: Values of $J_{MMs}/J_{OQ}$ at several low values of speedup for a 128 x 128 switch.

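
The entries of Table 1 can be reproduced from the right-hand side of (1). The following sketch is a direct evaluation of that expression; the function name `ratio_bound` is hypothetical.

```python
def ratio_bound(s, n):
    """Right-hand side of (1) for speedup s >= 3 and an n x n switch."""
    return n / (n - 1) * (2 * (s - 1) ** 2 + (s - 1)) / ((s - 1) ** 2 - 1)


# Bound for a 128 x 128 switch at the speedups listed in Table 1.
table = {s: round(ratio_bound(s, 128), 2) for s in (3, 4, 5, 8, 15)}
```

For large N the factor N/(N-1) tends to 1, so the bound at s = 3 tends to 10/3, and the bound approaches 2 as s grows.
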
4 Conclusions 

In this paper we have analyzed the average backlogs in network switches using a maximal 
size matching scheduling policy with speedup. It is shown that switches using maximal 
matching with speedup achieve backlogs comparable to an optimal switch. For the sake 
of simplicity, we have focused on the case of IID arrivals with uniform loading on input 
and output ports. We believe that the performance bounds proven in this paper can be 
tightened when arrivals are time correlated, and this is a subject of future research. 